Illuminating the Path
The Research and Development Agenda for Visual Analytics

Executive Summary CH4

Data Representations and Transformations (Chapter 4)

Visualization is intended to represent data and information in a way that can be acted upon by the analyst. The quality of a visualization is most directly affected by the quality of the data representation that underlies it.

Data must be transformed into a representation that is appropriate to the analytical task and appropriately conveys the important content of a large, complex, and dynamic collection. A data transformation is a computational procedure that converts between data representations. Data transformations are used to convert data into new, semantically meaningful forms. For example, linguistic analysis can be used to assign meaning to the words in a text document. Data transformations may also be used to determine the optimal way to display data, for example by creating a two-dimensional representation of data with hundreds or thousands of dimensions.
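The agenda does not prescribe a particular dimensionality-reduction method. As one hedged illustration, principal component analysis (PCA), a standard linear technique, projects each high-dimensional record onto the two directions of greatest variance, yielding 2-D coordinates suitable for plotting. The sketch below assumes numeric records of equal length and uses only the Python standard library (power iteration with deflation). The function names are hypothetical, and a production system would use an optimized linear-algebra library and might prefer nonlinear methods.

```python
import math
import random


def _normalize(v):
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def _dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _matvec(m, v):
    """Multiply a square matrix m (list of rows) by vector v."""
    return [_dot(row, v) for row in m]


def _top_eigenpair(m, iters=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    v = _normalize([rng.random() + 0.1 for _ in range(len(m))])
    for _ in range(iters):
        w = _matvec(m, v)
        if not any(w):
            break  # m maps v to zero: no remaining variance in this direction
        v = _normalize(w)
    return _dot(v, _matvec(m, v)), v


def project_2d(rows):
    """Project equal-length numeric rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    lam1, pc1 = _top_eigenpair(cov)
    # Deflation: remove the first component so power iteration finds the second.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    _, pc2 = _top_eigenpair(deflated, seed=1)
    return [(_dot(r, pc1), _dot(r, pc2)) for r in centered]
```

Because the covariance matrix is d by d, this naive version is practical only for modest dimensionality; it shows the idea of the transformation rather than a scalable implementation.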

Transforming and representing data is complex for many reasons. The first issue is the sheer number of different types of data that may be analyzed: text in the form of short or long documents in many languages, numeric data from sensors, structured data from relational databases, audio, video, and image data. Each of these types of data may need to be transformed in different ways to facilitate visual analysis.

The massive scale and dynamic nature of data dictate that the transformations must be fast, flexible, and capable of operating at many levels of abstraction. Data are of varying levels of certainty and reliability, so assessments of their quality must be preserved and presented. Data of different types are often required to conduct an analysis, so it is very important to develop a data synthesis capability: a capability to bring data of different types together in a single environment so that analysts can concentrate on the meaning of data rather than on the form in which they were originally packaged.

Develop both theory and practice for transforming data into new scalable representations that faithfully represent the content of the underlying data.

From the standpoint of the analyst, border guard, or first responder, information provides guidance, insight, and support for assessments and decisions. Our goal is to illuminate the potentially interesting content within the data so that users may discover important and unexpected information buried within massive volumes of data. Each type of data presents its own challenges for data representation and transformation. In most cases, data representations are not meant to replace the original data but to augment them by highlighting relevant nuggets of information to facilitate analysis.

We must develop mathematical transformations and representations that can scale to deal with vast amounts of data in a timely manner. These approaches must provide a high-fidelity representation of the true information content of the underlying data. They must support the need to analyze a problem at varying levels of abstraction and consider the same data from multiple viewpoints.

Data are dynamic and may be found in ever-growing collections or in streams that may never be stored. New representation methods are needed to accommodate the dynamic and sometimes transient nature of data. Transformation methods must include techniques to detect changes, anomalies, and emerging trends.
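As a hedged illustration of change detection on a stream that is never stored, the sketch below keeps only a running count, mean, and sum of squared deviations (Welford's online algorithm) and flags a value whose z-score against the values seen so far exceeds a threshold. The class name, the default threshold of three standard deviations, and the warm-up length are assumptions chosen for illustration, not values from this agenda.

```python
import math


class StreamingZScoreDetector:
    """Flags values far from the running mean without storing the stream.

    Welford's online algorithm maintains the mean and variance, so memory
    use stays constant however many values arrive.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold     # assumed default: 3 standard deviations
        self.warmup = max(2, warmup)   # values observed before any flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                  # sum of squared deviations from the mean

    def update(self, x):
        """Score x against the history so far, then fold x into the statistics.

        Returns True if x is anomalous relative to the values seen before it.
        """
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford update of count, mean, and m2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous
```

A single z-score test catches only abrupt outliers in one numeric series; detecting emerging trends, or changes in text and other non-numeric streams, would require richer methods.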

Methods exist at varying levels of maturity for transforming data. For example, there are a variety of methods for transforming the content of textual documents using either statistical or semantic approaches. Combining the strengths of these two approaches may greatly improve the results of the transformation.
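On the statistical side, one long-established technique is TF-IDF (term frequency times inverse document frequency) weighting, which turns each document into a vector whose weights favor terms that are frequent in that document but rare across the collection. The sketch below is a minimal version under stated assumptions (a crude lowercase tokenizer, no stemming or stop-word list, natural-log IDF); the function names are hypothetical. It captures nothing about meaning, which is the gap a semantic approach such as linguistic analysis would fill.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letter characters (deliberately crude)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors
```

A term that appears in every document receives a weight of zero, so terms that distinguish one document from the rest stand out automatically.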


Create methods to synthesize information of different types and from different sources into a unified data representation so that analysts, first responders, and border personnel may focus on the meaning of the data.

Complex analytical tasks require the user to bring together evidence from a variety of data types and sources, including text sources in multiple languages, audio, video, and sensor data. Today’s analytical tools generally require that the user consider data of different types separately. However, users need to understand the meaning of their information and to consider all the evidence together, without being restricted by the type of data in which the evidence originally arrived. Furthermore, they need to be able to consider their information at different levels of abstraction.

Synthesis is essential to the analysis process. While it is related to the concept of data fusion, it entails much more than placing information of different types on a map display. The analytical insight needed for homeland security missions requires the integration of relationships, transactions, images, and video at the level of true meaning. While spatial elements may be displayed on a map, the non-spatial information must be synthesized with that spatial information at the meaning level and presented to the user in a unified representation.


Develop methods and principles for representing data quality, reliability, and certainty measures throughout the data transformation and analysis process.

By nature, data are of varying quality, and most data have levels of uncertainty associated with them. Furthermore, the reliability of data may differ based on a number of factors, including the data source. As data are combined and transformed, the uncertainties may become magnified. These uncertainties may have profound effects on the analytical process and must be portrayed to users to inform their thinking. Users will also make their own judgments of data quality, uncertainty, and reliability based on their expertise, and these judgments must be captured and incorporated as well. Finally, in this environment of constant change, assessments of data quality or uncertainty may be called into question at any time by new and conflicting information.

The complexity of this problem will require algorithmic advances to address the establishment and maintenance of uncertainty measures at varying levels of data abstraction.
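A minimal sketch of how an uncertainty measure can be carried through a transformation, and how it grows as data are combined: each value travels with a standard uncertainty, and arithmetic applies the usual first-order propagation rules. The class name is hypothetical, and the rules assume independent errors and small relative uncertainties; this is an illustration, not a method proposed by this agenda.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Uncertain:
    """A value paired with its standard uncertainty (one standard deviation).

    Arithmetic uses first-order propagation and assumes the operands'
    errors are independent; correlated inputs would need covariance terms.
    """
    value: float
    sigma: float

    def __add__(self, other):
        # Sum: absolute uncertainties add in quadrature.
        return Uncertain(self.value + other.value,
                         math.hypot(self.sigma, other.sigma))

    def __mul__(self, other):
        # Product: sigma^2 ~= (b * sigma_a)^2 + (a * sigma_b)^2 to first order,
        # equivalently, relative uncertainties add in quadrature.
        return Uncertain(self.value * other.value,
                         math.hypot(other.value * self.sigma,
                                    self.value * other.sigma))

    @property
    def relative(self):
        """Relative uncertainty sigma / |value| (infinite when value is zero)."""
        return self.sigma / abs(self.value) if self.value else math.inf


a = Uncertain(10.0, 3.0)
b = Uncertain(20.0, 4.0)
print(a + b)
print(a * b, (a * b).relative)
```

Adding values with uncertainties of 3 and 4 gives an uncertainty of 5, larger than either input, and the product's relative uncertainty exceeds that of either factor. Capturing analysts' own reliability judgments would require fields beyond this sketch.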